Summary. Local minima and the saddle points separating them in the energy
landscape are known to dominate the dynamics of biopolymer folding. Here we
introduce a notion of a "folding funnel" that is concisely defined in terms of energy
minima and saddle points, while at the same time conforming to the notion of a
"folding funnel" as it is discussed in the protein folding literature.
1 Introduction

The dynamics of structure formation ("folding") of biopolymers, both proteins
and nucleic acids, can be understood in terms of their energy landscapes.
Formally, a landscape is determined by a set $X$ of conformations or states, a
neighborhood structure on $X$ that encodes which conformations can be reached
from which other ones, and an energy function $E: X \to \mathbb{R}$ that assigns
the folding energy to each state. In the case of nucleic acids it has been
demonstrated that dynamic features of the folding process can be derived, at
least to a good approximation, by replacing the full landscape with the collection
of local minima and their connecting saddle points [1].

The notion of a "folding funnel" has a long history in the protein folding
literature [2, 3, 4, 5, 6]. It arose from the observation that the folding process
of naturally evolved proteins very often follows simple empirical rules that
seem to bypass the complexity of the vast network of elementary steps that
is required in general to describe the folding process on rugged energy landscapes.
Traditionally, the funnel is depicted as a relation of folding energy and
"conformational entropy", alluding to the effect that the energy decreases, on
average, as structures are formed that are more and more similar to the
native structure of a natural protein [7]. It may therefore come as a surprise
that, despite the great conceptual impact of the notion of a folding funnel
in protein folding research, there does not seem to be a clear mathematical
definition of a "funnel". Intuitively, one would expect that a funnel should be
defined in terms of the basins and barriers of the energy landscape (since,






Konstantin Klemm, Christoph Flamm, and Peter F. Stadler 



as mentioned above, these coarse-grained topological features determine the 

folding dynamics). Furthermore, it should imply the "funneling" of folding 
trajectories towards the ground state of the molecule. 

2 Folding Dynamics as a Markov Chain

We consider here only finite discrete conformation spaces $X$ with a prescribed
set of elementary moves or transitions that inter-convert conformations. In the
following we write $M(x)$ for the set of conformations accessible from $x \in X$.
For example, $X = \{-1, +1\}^n$ in a spin-glass setting, where flipping single spins
is the natural definition of a move. In the case of RNA or protein folding, the
breaking and formation of individual contacts between nucleotides or amino
acids, respectively, is the most natural type of move set [8].
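As a small illustration of a move set, the sketch below enumerates the single-spin-flip neighborhood $M(x)$ on $X = \{-1,+1\}^n$. Representing states as Python tuples and the helper name `spin_flip_moves` are our own assumptions, chosen only for illustration.

```python
from itertools import product

def spin_flip_moves(x):
    """M(x) for x in {-1, +1}^n: all states that differ from x in exactly one spin."""
    return [x[:i] + (-x[i],) + x[i + 1:] for i in range(len(x))]

# Hypothetical usage: enumerate the whole space X = {-1, +1}^3 and its move set.
X = list(product((-1, 1), repeat=3))
M = {x: spin_flip_moves(x) for x in X}
print(len(X), "states, each with", len(M[X[0]]), "neighbors")
```

The move set is symmetric: $y \in M(x)$ exactly when $x \in M(y)$.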

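The dynamics on $X$ is modeled as usual by the first-order Markov chain
with Metropolis transition probabilities

$$ p(y|x) = \frac{1}{|M(x)|}\,\min\left\{1,\; e^{-\beta (E_y - E_x)}\right\} \quad \text{for } y \in M(x), \qquad p(x|x) = 1 - \sum_{y \in M(x)} p(y|x), \qquad (1) $$

where $\beta$ is the inverse temperature. All other transition probabilities are zero.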
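The sketch below computes the Metropolis transition probabilities given above. It assumes the usual form $p(y|x) = \min\{1, e^{-\beta(E_y - E_x)}\}/|M(x)|$. The landscape is given as a hypothetical adjacency dict `adj` (state to list of neighbors, i.e. $M(x)$) and an energy dict `E`. This illustrates the rule and is not the authors' code.

```python
import math

def metropolis_matrix(adj, E, beta):
    """Return p[x][y] with p(y|x) = min{1, exp(-beta (E_y - E_x))} / |M(x)|.

    adj: dict mapping each state x to the list M(x) of its neighbors (no self-loops).
    E:   dict mapping each state to its energy.
    """
    P = {}
    for x, nbrs in adj.items():
        row = {}
        for y in nbrs:
            # exp(min(0, .)) equals min{1, exp(.)} and avoids overflow for downhill moves.
            row[y] = math.exp(min(0.0, -beta * (E[y] - E[x]))) / len(nbrs)
        # The remaining probability mass stays at x.
        row[x] = 1.0 - sum(row.values())
        P[x] = row
    return P

# Hypothetical three-state chain with a small barrier in the middle.
adj = {0: [1], 1: [0, 2], 2: [1]}
E = {0: 0.0, 1: 1.0, 2: 0.0}
P = metropolis_matrix(adj, E, beta=1.0)
```

Downhill moves are always accepted. An uphill move out of state 0 is accepted with probability $e^{-\beta}$. The rejected mass becomes $p(0|0)$.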

We will be interested in the average time the system takes to reach a
predefined target state $x_0 \in X$ when starting at state $x \in X$. It is given by the
recurrence

$$ \tau_x = \sum_{y \in M(x)} p(y|x)\,\tau_y + p(x|x)\,\tau_x + 1 \qquad (2) $$

with $\tau_{x_0} = 0$ (target state).
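Recurrence (2) is a linear system. Setting $\tau_{x_0} = 0$, the remaining unknowns satisfy $(I - Q)\,\tau = \mathbf{1}$, where $Q$ is the transition matrix restricted to the non-target states. The sketch below solves this system with NumPy. It reuses the hypothetical `adj`/`E` representation and assumes the Metropolis rule $p(y|x) = \min\{1, e^{-\beta(E_y-E_x)}\}/|M(x)|$. It also assumes that $x_0$ is reachable from every state; otherwise the system is singular.

```python
import math
import numpy as np

def metropolis_matrix(adj, E, beta):
    """Transition probabilities p[x][y] as nested dicts (Metropolis rule)."""
    P = {}
    for x, nbrs in adj.items():
        row = {y: math.exp(min(0.0, -beta * (E[y] - E[x]))) / len(nbrs) for y in nbrs}
        row[x] = 1.0 - sum(row.values())
        P[x] = row
    return P

def mean_first_passage_times(adj, E, beta, target):
    """Solve Eq. (2): tau_x = sum_y p(y|x) tau_y + p(x|x) tau_x + 1, with tau_target = 0."""
    P = metropolis_matrix(adj, E, beta)
    states = [x for x in adj if x != target]
    idx = {x: i for i, x in enumerate(states)}
    A = np.eye(len(states))
    for x in states:
        for y, p in P[x].items():
            if y != target:  # tau_target = 0, so terms with y = target drop out
                A[idx[x], idx[y]] -= p
    tau = np.linalg.solve(A, np.ones(len(states)))
    result = {x: float(tau[idx[x]]) for x in states}
    result[target] = 0.0
    return result

# Hypothetical flat chain 0-1-2-3 with target 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
E = {x: 0.0 for x in adj}
print(mean_first_passage_times(adj, E, beta=1.0, target=0))
```

On a flat landscape every move is accepted. The chain then reduces to a simple random walk reflected at state 3, and the times from states 1, 2 and 3 are 5, 8 and 9.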

In order to investigate the physical basis of the "funneling effect", we start
with a simple one-dimensional toy model with landscapes defined over the
integers $\{0, \ldots, n\}$, see Fig. 1. The time $\tau$ to target crucially depends on the
ordering of barriers. It is shortest when the barriers decrease
towards the ground state, as in panel (c) of Fig. 1. This property of decreasing
barriers towards the ground state matches the intuition of folding funnels. In
the following section we generalize it to arbitrary landscapes.
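The one-dimensional toy model can be solved without a matrix. Let $d_x$ be the mean time to step from $x$ to $x-1$, with $a_x = p(x-1|x)$ and $b_x = p(x+1|x)$. Then $d_x = (1 + b_x d_{x+1})/a_x$, with $b_n = 0$ at the reflecting end, and $\tau_x = d_1 + \dots + d_x$. The sketch below assumes target $x_0 = 0$, moves to $x \pm 1$ only, and the Metropolis rule with $|M(x)|$ equal to 1 at the ends and 2 inside. The two example landscapes are hypothetical and only show how the barrier ordering can be varied.

```python
import math

def chain_mfpt(E, beta):
    """Mean first passage times tau[x] to target 0 on the chain {0, ..., n}.

    E[x] is the energy of state x; moves go to x - 1 and x + 1 with Metropolis acceptance.
    """
    n = len(E) - 1

    def p(x, y):
        deg = (x > 0) + (x < n)  # |M(x)|: 1 at the ends, 2 inside
        return math.exp(min(0.0, -beta * (E[y] - E[x]))) / deg

    # d[x] = mean time to go from x to x - 1, computed backwards from the reflecting end n.
    d = [0.0] * (n + 2)
    for x in range(n, 0, -1):
        b = p(x, x + 1) if x < n else 0.0
        d[x] = (1.0 + b * d[x + 1]) / p(x, x - 1)
    tau = [0.0]
    for x in range(1, n + 1):
        tau.append(tau[-1] + d[x])
    return tau

# Hypothetical landscapes with the same barrier heights (1, 2, 3, 4) in opposite order.
decreasing_towards_ground = [0, 1, 0.2, 2, 0.4, 3, 0.6, 4, 0.8]
increasing_towards_ground = [0, 4, 0.2, 3, 0.4, 2, 0.6, 1, 0.8]
for landscape in (decreasing_towards_ground, increasing_towards_ground):
    print(chain_mfpt(landscape, beta=2.0)[-1])  # tau_max, starting at the rightmost state
```

Any path to 0 from $x$ must pass through $x-1$. Hence $\tau_x$ always increases along the chain, and the last entry is $\tau_{\max}$.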

3 Geometric Funnels 

A conformation $x \in X$ is a local minimum if $E_x \le E_y$ for all $y \in M(x)$.
Allowing equality is a mere mathematical convenience [9]. Let $P_{xy}$ be the set
of all walks from $x$ to $y$. We say that $x$ and $y$ are mutually accessible at level
$\eta$, in symbols

$$ x \overset{\eta}{\longleftrightarrow} y, \qquad (3) $$

if there is a walk $p \in P_{xy}$ such that $E_z \le \eta$ for all $z \in p$.

Fig. 1. (a-d) Dynamics on one-dimensional energy landscapes $E(x)$ (thick curves).
Thin curves show the average first passage time to the target state $x_0$ when starting from
a given state $x$, for inverse temperatures $\beta$ = (circle), $\beta = 1$ (square), $\beta = 2$ (diamond),
$\beta = 3$ (triangle up), and $\beta = 4$ (triangle left). Right-hand panel: temperature dependence of
first passage times in the landscapes (a-d); $\tau_{\max}$ is the average time to reach $x_0$ for
the first time when starting at the "rightmost" state $x = 19$. Slight changes in the slopes
or other details of the landscapes $E(x)$ do not change the qualitative behavior of
$\tau_{\max}$ as long as the ordering of barrier heights is conserved.

The saddle height $f(x,y)$ between two configurations $x, y \in X$ is the minimum
height at which they are accessible from each other, i.e.,

$$ f(x,y) = \min_{p \in P_{xy}} \max_{z \in p} E_z = \min\{\eta \mid x \overset{\eta}{\longleftrightarrow} y\}. \qquad (4) $$
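Eq. (4) is a minimax (bottleneck) path problem. This means $f(x,\cdot)$ can be computed for all targets at once with a Dijkstra-style search, in which the cost of a walk is the maximal energy along it rather than a sum. The sketch assumes the same hypothetical `adj`/`E` dictionaries as before. Mutual accessibility (3) then reduces to the test $f(x,y) \le \eta$.

```python
import heapq

def saddle_heights(adj, E, x):
    """f(x, y) of Eq. (4) for every y reachable from x.

    Dijkstra variant: the cost of a walk is the maximal energy along it (endpoints
    included), and this cost is minimized over all walks.
    """
    best = {x: E[x]}
    heap = [(E[x], x)]
    while heap:
        h, u = heapq.heappop(heap)
        if h > best[u]:
            continue  # stale heap entry
        for v in adj[u]:
            cand = max(h, E[v])
            if cand < best.get(v, float("inf")):
                best[v] = cand
                heapq.heappush(heap, (cand, v))
    return best

def accessible(adj, E, x, y, eta):
    """Mutual accessibility at level eta, Eq. (3): equivalent to f(x, y) <= eta."""
    return saddle_heights(adj, E, x).get(y, float("inf")) <= eta

# Hypothetical 4-cycle: two routes from 0 to 3, over state 1 (E = 5) or state 2 (E = 2).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
E = {0: 0, 1: 5, 2: 2, 3: 1}
print(saddle_heights(adj, E, 0)[3])  # 2
```

The cheaper route over state 2 determines $f(0,3) = 2$, so state 2 is the saddle between 0 and 3.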

The saddles between $x$ and $y$ are exactly the points of maximal energy along the
optimal (minimax) paths in Eq. (4). We say, furthermore, that a saddle point $s$ directly
connects the local minima $x$ and $y$ if there are paths $p_{sx}$ and $p_{sy}$ from $s$ to $x$
and $y$, respectively, such that $E$ is monotonically decreasing along both $p_{sx}$
and $p_{sy}$.
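Whether a saddle $s$ directly connects $x$ and $y$ can be checked by a breadth-first search from $s$ along edges on which the energy decreases. This sketch reads "monotonically decreasing" as strictly decreasing, which is an assumption; on landscapes with plateaus one might allow non-increasing steps instead. The helper names are hypothetical.

```python
from collections import deque

def descends_to(adj, E, s, x):
    """True if some walk from s to x has strictly decreasing energy at every step."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == x:
            return True
        for v in adj[u]:
            if E[v] < E[u] and v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def directly_connects(adj, E, s, x, y):
    """Saddle s directly connects the local minima x and y."""
    return descends_to(adj, E, s, x) and descends_to(adj, E, s, y)

# Hypothetical 4-cycle with local minima 0 and 3 and candidate saddles 1 and 2.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
E = {0: 0, 1: 5, 2: 2, 3: 1}
print(directly_connects(adj, E, 2, 0, 3))  # True
```

Both states 1 and 2 connect the minima 0 and 3 directly. Only state 2 lies at the saddle height $f(0,3) = 2$, however.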

For simplicity we assume the following weak non-degeneracy condition for
our energy landscape: for every local minimum $x$ there is a unique saddle
point $s_x$ of minimal height $h(x) = \min_z f(x,z)$, where $z$ ranges over the local
minima other than $x$. Note that $s_x$ is necessarily a
direct saddle between $x$ and some other local minimum $z$, which for simplicity
we again assume to be uniquely determined. This condition is stronger than
local non-degeneracy but weaker than global non-degeneracy in the sense of






[Fig. 2 plot; horizontal axis: number of local minima]



Fig. 2. Fraction of minima belonging to the funnel vs. the total number of minima
found in the landscape. RNA hairpins (filled squares) and RNAs with two different
near-ground-state structures (filled circles). The straight line has slope $-1$; landscapes
with only the ground state in the funnel fall on this line. Open symbols are the
results for the number partitioning problem (NPP) with sizes $n = 8$ (circles),
$n = 10$ (squares) and $n = 12$ (triangles). For each system size, 30 instances were
generated by drawing random numbers $a_1, a_2, \ldots, a_n$ and $c$ independently from the unit
interval. The energy of a state $(x_1, \ldots, x_n) \in \{-1, 1\}^n$ is $E(x) = |\sum_i a_i x_i + c|$.

The constant $c$ represents an extra "clamped" degree of freedom that breaks the symmetry
under reversal of all spins. This ensures that almost all instances have a unique ground
state.

[9]. In the degenerate case, we consider the set of all direct saddles of minimum
height and the set of local minima directly connected to them.

Now we can define the funnel of a landscape recursively as the following
set $F$ of states:

(1) The ground state is contained in the funnel $F$.

(2) A local minimum $x$ belongs to the funnel if its minimum saddle $s_x$
connects it directly to a local minimum in the funnel $F$.

(3) A state $z$ belongs to the funnel if it is connected by a gradient descent
path to a local minimum in $F$.
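The sketch below implements rules (1) to (3). It assumes that the map `partner[x]`, giving the local minimum reached directly through the minimal saddle $s_x$, has already been computed, for example from a barrier tree; here it is a hypothetical input. Rule (3) uses steepest descent (always moving to the lowest neighbor), which is one possible reading of "gradient descent path". The ground state is assumed unique.

```python
def local_minima(adj, E):
    """States x with E[x] <= E[y] for all neighbors y in M(x)."""
    return {x for x in adj if all(E[x] <= E[y] for y in adj[x])}

def descent_minimum(x, adj, E):
    """Follow steepest descent from x until a local minimum is reached."""
    while True:
        y = min(adj[x], key=lambda v: E[v], default=x)
        if E[y] >= E[x]:
            return x
        x = y

def funnel_states(adj, E, partner):
    """Funnel F from rules (1)-(3); partner[x] is the minimum reached directly via s_x."""
    minima = local_minima(adj, E)
    F = {min(minima, key=lambda m: E[m])}  # rule (1): the ground state
    changed = True
    while changed:  # rule (2), iterated until no further minimum is added
        changed = False
        for x in minima:
            if x not in F and partner.get(x) in F:
                F.add(x)
                changed = True
    return {z for z in adj if descent_minimum(z, adj, E) in F}  # rule (3)

# Hypothetical chain 0..6 with local minima 0, 2, 4, 6 (minimal saddles read off by hand).
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
E = dict(enumerate([0, 2, 1, 4, 0.5, 3, 0.8]))
partner = {0: 2, 2: 0, 4: 6, 6: 4}
print(sorted(funnel_states(adj, E, partner)))  # [0, 1, 2]
```

Minima 4 and 6 share their lowest saddle (state 5), so each names the other as partner. Neither ever reaches the ground state, and they stay outside the funnel.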

Using the above definition, we can recursively partition the landscape into
"local funnels": simply remove $F$ from $X$ and recompute the funnel of the
residual landscape.
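The recursive partition can be sketched on a coarse-grained description of the landscape. This consists of a dict of local minima with their energies and a list of direct saddles `(height, a, b)` between pairs of minima. A simplifying assumption, made only for this illustration, is that removing a funnel leaves the direct saddles between the remaining minima unchanged. In a full implementation the saddles of the residual landscape $X \setminus F$ would have to be recomputed. Energies and saddle heights are assumed to be non-degenerate.

```python
def funnel(minima, saddles):
    """Rules (1)-(2) on the level of local minima.

    minima:  dict local minimum -> energy.
    saddles: list of direct saddles (height, a, b) between local minima a and b.
    """
    ground = min(minima, key=minima.get)
    partner = {}
    for x in minima:
        # Direct saddles from x to other minima still present in the landscape.
        cands = [(h, b if a == x else a) for h, a, b in saddles
                 if x in (a, b) and a in minima and b in minima]
        if cands:
            partner[x] = min(cands)[1]  # partner across the minimal saddle s_x
    F = {ground}
    changed = True
    while changed:
        changed = False
        for x, z in partner.items():
            if x not in F and z in F:
                F.add(x)
                changed = True
    return F

def partition(minima, saddles):
    """Split the minima into local funnels by repeatedly removing the funnel."""
    remaining = dict(minima)
    parts = []
    while remaining:
        F = funnel(remaining, saddles)
        parts.append(F)
        remaining = {x: e for x, e in remaining.items() if x not in F}
    return parts

# Hypothetical coarse-grained landscape with five local minima.
minima = {"G": 0.0, "A": 1.0, "B": 2.0, "C": 1.5, "D": 3.0}
saddles = [(2.0, "G", "A"), (3.0, "A", "B"), (5.0, "A", "C"), (2.5, "C", "D")]
print(partition(minima, saddles))
```

Here C and D are each other's lowest-saddle partners, so they end up outside the ground-state funnel. They form a second local funnel with C as its bottom.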




Fig. 3. Funnel partitioning for the folding landscape of the RNA sequence xbix
(CUGCGGCUUUGGCUCUAGCC). The landscape falls into three funnels. In [1] it was shown
that a large part of the folding trajectories reach the metastable state 2, whose energy
lies 0.8 kcal/mol above the energy of the ground state 1.

As an example of biopolymers we consider small artificial RNA sequences
which have been designed either to fold into a single stable hairpin structure
or to have two near-ground-state structures that have very few base pairs in
common. In the first case we expect landscapes dominated by funnels because
the RNAinverse algorithm [10] tends to produce robustly folding sequences.
In the second case we used the design procedure outlined in [11] to produce
sequences that have decoy structures with moderate to large basins of attraction.
The sequences we use here have a length of 30 nt or less, shorter than
most structured RNAs of biological importance.

The barriers algorithm, which efficiently computes local minima and
their separating saddle points from an energy-sorted list of states [8], can be
modified to compute the funnel partitioning of the energy landscapes. We will
report on this topic elsewhere.
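Computing minima and saddles from an energy-sorted list of states can be illustrated with a generic flooding procedure based on union-find. This is a simplified sketch, not the actual barriers implementation. States are processed in order of increasing energy. A state with no lower, already processed neighbor starts a new basin, and a state that touches several basins is recorded as the saddle at which they merge. Pairwise distinct energies are assumed.

```python
def flood(adj, E):
    """Return (minima, merges) for a landscape given by adjacency dict adj and energies E.

    merges holds (saddle_state, lower_minimum, higher_minimum) in order of
    increasing saddle energy.
    """
    parent = {}  # union-find forest over processed states
    rep = {}     # root -> lowest local minimum in its basin

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    minima, merges = [], []
    for s in sorted(adj, key=lambda v: E[v]):
        roots = {find(v) for v in adj[s] if v in parent}
        parent[s] = s
        if not roots:  # no lower neighbor: s is a local minimum
            rep[s] = s
            minima.append(s)
            continue
        roots = sorted(roots, key=lambda r: E[rep[r]])
        low = roots[0]
        parent[s] = low  # s joins the basin with the lowest minimum
        for r in roots[1:]:  # s is a saddle between these basins
            merges.append((s, rep[low], rep[r]))
            parent[r] = low
    return minima, merges

# Hypothetical chain 0..6.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
E = dict(enumerate([0, 2, 1, 4, 0.5, 3, 0.8]))
print(flood(adj, E))
```

The recorded merges, ordered by saddle energy, are the information from which a barrier tree is usually drawn.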

Figure 2 shows the fraction of local minima contained in the funnels of several
landscapes. The RNA folding landscapes have folding funnels comprising
a large part of the landscape. The landscapes of RNA sequences forming hairpins
have the largest funnels. In comparison, we plot the relative sizes of funnels
for number partitioning problems of different sizes, as defined in the caption
of Fig. 2. These artificial landscapes have significantly smaller funnels than
the RNA folding landscapes. Thus the latter have folding funnels much larger
than expected for random rugged landscapes. Through these large funnels the
folding polymer may be "guided" towards the native state.
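The number partitioning landscapes used for comparison are easy to set up. States are spin vectors, moves are single spin flips, and $E(x) = |\sum_i a_i x_i + c|$. The sketch below enumerates all $2^n$ states of a random instance and lists the local minima, using $\le$ as in the definition of a local minimum. Computing their funnel would additionally require the saddle information. The helper names are hypothetical.

```python
import itertools
import random

def npp_energy(x, a, c):
    """E(x) = |sum_i a_i x_i + c| for a spin vector x."""
    return abs(sum(ai * xi for ai, xi in zip(a, x)) + c)

def npp_local_minima(a, c):
    """All local minima of the NPP landscape under single spin flips (E_x <= E_y)."""
    n = len(a)
    states = list(itertools.product((-1, 1), repeat=n))
    E = {x: npp_energy(x, a, c) for x in states}

    def flips(x):
        return [x[:i] + (-x[i],) + x[i + 1:] for i in range(n)]

    return [x for x in states if all(E[x] <= E[y] for y in flips(x))]

# A random instance: a_1..a_n and c drawn uniformly from [0, 1).
random.seed(1)
n = 8
a = [random.random() for _ in range(n)]
c = random.random()
print(len(npp_local_minima(a, c)), "local minima out of", 2 ** n, "states")
```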

Figure 3 shows an example of an RNA sequence with a strong kinetic trap,
studied in detail in [1]. In this landscape, a suboptimal structure has a local
funnel that covers most of the landscape, while the ground state is separated
by comparably high barriers from almost all other local optima.
In summary, we have introduced a rigorous definition of a folding
funnel that is computationally tractable for arbitrary energy landscapes. In
the case of RNA, where the low-energy part of the landscape can be generated
without exhaustively enumerating all configurations [12], funnels
can be computed explicitly even for sequences of immediate biological
interest. Our first computational results show that the energy landscapes of
RNAs typically differ from the rugged landscapes of spin-glass-style combinatorial
optimization problems by exhibiting significantly larger funnels for the
ground state. It remains to be investigated in future work whether this is also
true, e.g., for lattice protein models. A second important topic of ongoing research
is the question of whether, and to what extent, evolutionary processes select
molecules with funnel-like landscapes.

Acknowledgments. This work was supported in part by the EMBIO project
in FP-6 (http://www-embio.ch.cam.ac.uk/).